Methods for efficient semi-automatic pronunciation dictionary bootstrapping

Authors

Tim Schlippe

Matthias Merz

Tanja Schultz

Abstract

In this paper, we propose efficient methods that contribute to rapid and economical semi-automatic pronunciation dictionary development, and we evaluate them on English, German, Spanish, Vietnamese, Swahili, and Haitian Creole. First, we determine optimal strategies for word selection and for the interval at which the grapheme-to-phoneme (G2P) model is retrained. In addition to the traditional concatenation of the single phonemes most commonly associated with each grapheme, we show that web-derived pronunciations and cross-lingual G2P models can help reduce the initial editing effort. Furthermore, we show that our phoneme-level combination of the outputs of multiple G2P converters reduces the editing effort more than the best single converter does. Overall, our phoneme-level combination yields an average relative reduction in editing effort of 15% compared to conventional methods. Including initial pronunciations from Wiktionary for English, German, and Spanish reduced the effort by a further 6% relative.
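The abstract does not say how the converter outputs are aligned, how votes are weighted, or how editing effort is counted. The sketch below therefore illustrates the general idea of a phoneme-level combination under stated assumptions only. It assumes the hypotheses have already been aligned column by column with a gap symbol `-` (for example by a ROVER-style multiple alignment), resolves each column by simple majority vote, and breaks ties in favor of the converter listed first. It also assumes, purely for illustration, that editing effort can be approximated by the phoneme-level Levenshtein distance to the correct pronunciation. The names `combine`, `strip_gaps`, `edit_distance`, and `GAP` are hypothetical and do not come from the paper.

```python
from collections import Counter

GAP = "-"  # hypothetical placeholder for "no phoneme in this column"


def combine(aligned_hypotheses):
    """Phoneme-level majority vote over pre-aligned G2P outputs.

    Assumes every hypothesis has the same length, with GAP marking
    positions where a converter produced no phoneme. Ties go to the
    hypothesis listed first (Counter keeps first-encountered order).
    """
    lengths = {len(h) for h in aligned_hypotheses}
    if len(lengths) != 1:
        raise ValueError("hypotheses must be aligned to a common length")
    combined = []
    for column in zip(*aligned_hypotheses):
        winner, _count = Counter(column).most_common(1)[0]
        if winner != GAP:  # a winning gap means "emit nothing here"
            combined.append(winner)
    return combined


def strip_gaps(hypothesis):
    """Drop alignment gaps to recover a plain phoneme sequence."""
    return [p for p in hypothesis if p != GAP]


def edit_distance(a, b):
    """Levenshtein distance between two phoneme lists (one DP row at a time)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical ARPAbet-style outputs (stress marks omitted) of three
    # converters for the word "table", already aligned column by column.
    hyps = [
        ["T", "EY", "B", GAP, "L"],   # converter 1 drops AH
        ["T", "AE", "B", "AH", "L"],  # converter 2 picks the wrong vowel
        ["D", "EY", "B", "AH", "L"],  # converter 3 voices the onset
    ]
    reference = ["T", "EY", "B", "AH", "L"]
    for h in hyps:
        print(edit_distance(strip_gaps(h), reference))  # 1 edit each
    merged = combine(hyps)
    print(merged, edit_distance(merged, reference))  # ['T', 'EY', 'B', 'AH', 'L'] 0
```

In this toy example, each single converter needs one correction, while the vote recovers the reference because the converters make their errors in different columns. Listing the converters best-first means that ties fall back to the strongest single system; weighting votes by converter reliability would be a natural alternative.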

Similar resources

Bootstrapping pronunciation dictionaries: practical issues

Bootstrapping techniques are an efficient way to develop electronic pronunciation dictionaries [1, 2], but require fast system response to be practical for medium-to-large lexicons. In addition, user errors are inevitable during this process, and it is useful if automatic means can be used to assist in the search for such errors. We describe how the Default&Refine grapheme-to-phoneme rule extrac...

The efficient generation of pronunciation dictionaries: human factors during bootstrapping

Bootstrapping techniques have significant potential for the efficient generation of linguistic resources such as electronic pronunciation dictionaries. We describe a system and an approach to bootstrapping for the development of such dictionaries, and report on experiments conducted to investigate the efficiency and effectiveness of the system, focusing on the human factors that influence the p...

Semi-supervised G2p bootstrapping and its application to ASR for a very under-resourced language: Iban

This paper describes our experiments and results on using a local dominant language in Malaysia (Malay) to bootstrap automatic speech recognition (ASR) for a very under-resourced language: Iban (also spoken in Malaysia, on the island of Borneo). Resources in Iban for building a speech recognition system were nonexistent. To address this, we tried to take advantage of a language from the same family with se...

The efficient generation of pronunciation dictionaries: machine learning factors during bootstrapping

Several factors affect the efficiency of bootstrapping approaches to the generation of pronunciation dictionaries. We focus on factors related to the underlying rule-extraction algorithms, and demonstrate variants of the Dynamically Expanding Context algorithm, which are beneficial for this application. In particular, we show that continuous updating of the learned rules, coupled with a new app...

Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks

Phonemic or phonetic sub-word units are the most commonly used atomic elements to represent speech signals in modern ASR systems. However, they are not the optimal choice for several reasons, such as the large amount of effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation...

Publication date: 2014